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ABSTRACT 

We generalize the technique of fringe-rate filtering, whereby visibilities measured by a radio inter¬ 
ferometer are re-weighted according to their temporal variation. As the Earth rotates, radio sources 
traverse through an interferometer’s fringe pattern at rates that depend on their position on the sky. 
Capitalizing on this geometric interpretation of fringe rates, we employ time-domain convolution ker¬ 
nels to enact fringe-rate filters that sculpt the effective primary beam of antennas in an interferometer. 
As we show, beam sculpting through fringe-rate filtering can be used to optimize measurements for 
a variety of applications, including mapmaking, minimizing polarization leakage, suppressing instru¬ 
mental systematics, and enhancing the sensitivity of power-spectrum measurements. We show that 
fringe-rate filtering arises naturally in minimum variance treatments of many of these problems, en¬ 
abling optimal visibility-based approaches to analyses of interferometric data that avoid systematics 
potentially intro duced by traditio nal approaches such as imaging. Our techniques have recently been 
demonstrated in Ali et al. (2015), where new upper limits were placed on the 21cm power spectrum 
from reionization, showcasing the ability of fringe-rate filtering to successfully boost sensitivity and 
reduce the impact of systematics in deep observations. 


1. INTRODUCTION 

In recent years, low-frequency radio interferometers 
have undergone dramatic changes in design. These trans¬ 
formations have been driven by new science applications 
such as 21 cm cosmology, where one uses the highly red- 
shifted emission from the 21cm hyperfine transition of 
neutral hydrogen to map our early Universe. Observers 
in 21cm cosmology seek to measure small fluctuations 
(both spatially and spectrally) in a dim, diffuse back¬ 
ground that is obscured by bright foreground emission 
orders of magnitude brighter in brightness temperature. 
This stands in contrast to many traditional observations 
in radio astronomy, which more usually target bright, 
compact objects in front of a dim background, often over 
a small selection of frequencies. These differences have 
led to the design, construction, and usage of new inter¬ 
ferometers that only have moderate angular resolution, 
but are comprised of a large number of receiving elements 
with wide fields of view operating over a wide bandwidth. 
Examples of new interferometers that broadly fit some 
or all of this description include the Donald C. Backer 
Precision Array for Probing t he Epoch of Reionization 
(PAPER; Parsons et al.||2QlQ), the Murchison Widefield 


Array (MWA; Tingay et al 


the L Ow Erequency ARray (LOFAR; van Haarlem 


013 Bowman et al 


m et al 


2013|) , the Cana dian Hydrogen Intens ity Mapping Exper 
iment (CHIME; |Bandiira et al.|2Q 14[ ), the MIT Epoch o f 
Reionization experiment (IVII'IEl* 


experiment (MI'I'EoR; Zheng et al. 


the Larg e Aperture Experiment to De tect the Par 


2Q14D , 
k Ages 

(LEDA; |Greenhill & Bernardi||2012|), and the Hydro¬ 
gen E poch of Reionization Array (HERA; Pober et al. 
|2014|). Eurther deviating from conventional array de- 
signs, the PAPER, MITEoR, CHIME, HERA projects 
have also maximized sensitivity by choosing to pl ace their 
antenna elements in regular, redundant grids (Parsons 
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et al.||2012a). 

With new interferometer designs, it is natural to expect 
new approaches to data analysis. In this paper, we criti¬ 
cally examine methods for time integration. Integrating 
in time is a crucial step for the high-sensitivity appli¬ 
cations of modern low-frequency radio astronomy. Con¬ 
sider the measurement of the high-redshift 21cm power 
spectrum as an example application. At the relevant red- 
shifts ( 2 ; ^ 6 to 20), theoretical models suggest that this 
cosmological signal will be faint — on the order of 1 mK 
in brightness temperature. The noise power spectrum on 
such measurements reaches comparable magnitudes only 
after long integration (> 1000 hrs) on instruments opti¬ 
mized for such a measurement (iHarker et al.||2QlQ| Par ¬ 
sons et al.|2012a||Beardsley et al.l2013||Pober et al.|2014p, 
and even then, often only lor the largest spatial modes 
(depending on the instrument). Long time-integrations 
are therefore crucial not only for generating the requisite 
sensitivity for a detection of the cosmological signal, but 
also to allow faint systematics to be detected and excised 
from the data. 

, we extend ideas introduc ed in Parsons fc 
well as similar ideas inlRoshi & Perley 

& 


Backer d 

2009) (as well as si 


. Oi 

iringa et al. 

|2012h 

timize the process ol 

: como 


ita. The 

key realization is that fringe-rate—the Eourier dual to 
time—is a more natural space to enact time-averaging. 
Traditional time-averaging (say, a running box-car aver¬ 
age) is equivalent to multiplying by a sine filter in the 
fringe-rate domain. Generalizing this process, the con¬ 
volution theorem ensures that time integration can be 
achieved by weighting the data in the fringe-rate do¬ 
main. The fringe-rate domain provides a natural ba¬ 
sis for time-averaging interferometric data because as¬ 
tronomical sources are locked to the celestial sphere, and 
therefore appear at predictable fringe-rates in the data. 
In particular, for a given interferometric baseline, there 
exists a maximum allowable fringe-rate, beyond which 
there is only instrumental noise. Eringe-rate filtering al- 
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lows the clean elimination of such noise-like modes. 

We place a particular emphasis on the geometric inter¬ 
pretation of fringe-rate filtering, where weightings in the 
fringe-rate domain result in changes to an interferome¬ 
ter’s spatial response, effectively allowing different por¬ 
tions of the sky to be selected by carefully chosen fringe- 
rate filters. These filters can be optimized for a number 
of different applications, including the measurement of 
cosmological power spectra, the reduction of polariza¬ 
tion leakage, and the downweighting of contaminating 
sources far from the central regions of the sky that one 
is attempting to observe. Importantly, these filters can 
be implemented on a per-baseline basis, providing a dif¬ 
ferent view of systematics in the data, which are often 
easier to identify when described baseline-by-baseline, in¬ 
stead of being mixed together in an image-domain map. 
However, we will also show that optimally weighted map¬ 
making can also be more conveniently conceptualized in 
a mathematical framework operating in the fringe-rate 
basis. 

The rest of the paper is organized as follows. In Sec¬ 
tion we provide a general overview of fringe-rate fil¬ 
tering, establishing an essential geometric intuition for 
the process. The specific implementation that we use for 
the simulations in this paper are described in Section 
Section describes how fringe-rate filtering can be opti¬ 
mized for various applications. We pay specific attention 
to the problem of mapmaking in Section and show 
that fringe-rate filtering arises naturally in that context 
as well. We summarize our conclusions in Section |6l 


2 . OVERVIEW OE PRINCIPLE OE ERINGE-RATE 
EILTERING 


Generally, the interferometric response V at frequency 
ly for two antennas in a radio interferometer is described 
by the visibility functiorj^ 


/' 


Vhv{t) = I dQ ex.p —i27r-b(t) • r 


( 1 ) 


where Ijy is the specific intensity of the sky in the direc¬ 
tion f. Ay is the geometric mean of the primary beam 
power patterns of the constituent antennas (henceforth 
known as “the primary beam”), h{t) is the baseline 
vector separating the two antennas in question (which 
is time-dependent since the baselines rotate with the 
Earth), and u is the spectral frequency. We adopt the 
convention that our coordinate system is fixed to the ce¬ 
lestial sphere, because it will be convenient for our alge¬ 
braic manipulations later. However, it is equally valid to 
understand the time-variation of the visibilities as aris¬ 
ing from the movement of astronomical sources through 
the primary beam and the fringes arising from a base¬ 
line, which are fixed to a topocentric coordinate system. 
Eor drift-scan telescopes like PAPER, GHIME, or HERA, 
this view is particularly powerful because then the pri¬ 
mary beam and the fringe pattern are locked to one 
another, and may together be considered an enveloped 
fringe pattern that gives rise to time-variation in 
as the Earth rotates. 

The rate at which angular structure on the sky moves 
relative to the fringe pattern—the fringe rate —depends 


In this section, we omit the instrumental noise contribution to 
the measured visibilities in order to avoid notational clutter. 



Fig. 1.— The fringe pattern at 150 MHz of a fiducial 30-m 
east-west baseline, overlaid with arrows indicating the distance tra¬ 
versed by sources at various declinations over a two-hour time span 
centered at transit. In a fixed time interval, sources near declina¬ 
tion (5 = 0° traverse more fringe periods than sources nearer to the 
celestial poles. This gives rise to different fringe rates that can be 
used to distinguish sources in a time-series measured with a single 
baseline. 


on the declination and hour angle. As an example, Eigure 

t illustrates the real component of the phase variation in 
e fringe pattern of a 30-m east-west baseline deployed 
at —30° latitude. Although fringes are evenly spaced in 
I = sin^a,, the distance a source that is locked to the ce¬ 
lestial sphere travels through the fringe pattern depends 
on its position on the sphere. This is illustrated in Eig¬ 
ure by arrows that indicate the motion of sources at 
differing declinations over the course of two hours near 
transit. As the source and the fringe pattern move rel¬ 
ative to one another, Vy{t) oscillates with an amplitude 
that is determined by the strength of the source and the 
amplitude of the beam response, and a frequency that 
corresponds to the number of fringe periods traversed in 
a given time interval. Hence, the frequency or fringe-rate 
of oscillations in Vy{t) ranges from a maximum at ^ = 0° 
to zero at = —90°, and can even become negative for 
emission from the far side of the celestial pole. 

Let us now derive this intuition mathematically, as¬ 
suming a drift-scan telescope. To sort our time-variable 
visibilities into different fringe-rates /, we take the 
Eourier transform of our visibility over a short interval 
of time centered at time t to get 

Vbu{f,t)= J dni^{r)J -‘)/+7 

( 2 ) 

where we have introduced the notation = b(t), and 7 
is a tapering function for the Eourier transform in time, 
which we assume peaks when its argument is zero, in 
essence shifting the origin of our transform to time t. If 
the characteristic width of 7 is relatively short, one is 
effectively examining the visibility over short timescales, 
during which its time-dependence will likely be domi¬ 
nated by features on the sky moving relative to fringes. 
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and not the movement of the primary beam through the 
celestial sphere. We may therefore say that for short pe¬ 
riods of time, ^ Ay{YA)' Additionally, we may 

take the time-dependence of the baselines to leading or¬ 
der, with 






dt 


t'=t 


— bt — (bt X 0 ^ 0 )(t' — t) + ... 


( 3 ) 


where u)^ is the angular velocity vector of the Earth’s 
rotation. In the last equality, we used the fact that the 
time-dependence of the baselines are not arbitrary, but 
instead are tied to the Earth’s rotation, transforming the 
time derivative into a cross-product with 0 ^ 0 , as one does 
in the analysis of solid rotating bodies. Inserting these 
approximations into Equation Q yields 


— i27r^bfr 


V bu (/, ^) = y 

• J — t)e 


i27r[^(btXW0)-r-/](t'-t) 


-(bt X o)®) -r-/ , (4) 


where 7 is the inverse Eourier transform of 7 . To the 
extent that 7 (t) can be chosen to be relatively broad 
without violating our approximations, 7 will be peaked 
around the point where its argument is zero. Its pres¬ 
ence in Equation © therefore acts approximately like a 
delta function, selecting portions of the sky that have r 
satisfying the condition / ^ f • b x 

In words, what the above derivation shows is that, 
as claimed, different fringe-rates correspond to different 
parts of the sky. This is illustrated in Eigure which 
shows shaded bands of constant fringe-rate for the same 
baseline as the one simulated in Eigure In general, 
contours of constant fringe-rate correspond to locations 
on the sky f that have the same degenerate combination 
of r • u ;0 X b. Note that this combination can also be 
rewritten as • b x r or b • r x by cyclic permuta¬ 
tion. Thus, if any two of b, and r are parallel, their 
cross product—and hence the fringe-rate—will be zero. 
Eor example, the fringe-rate for astronomical sources lo¬ 
cated at either celestial pole will always be zero, since r 
would then be parallel to u; 0 . Similarly, a north-south 
only baseline located at the equator would have b parallel 
to u; 0 , resulting in / = 0 because in such a scenario the 
fringes would have no azimuthal dependence, and thus 
there would be no fringe-crossings as the Earth rotates 
relative to the sky. 

Because different fringe-rates correspond to different 
parts of the sky, we may effectively select different por¬ 
tions of the sky by picking different linear combinations 
of fringe-rates. To see this, imagine decomposing our 
data into fringe-rates, and then applying a weighting 
function w{f) before Eourier transforming back to the 



-0.6-0.4--0.2 0.0 0.2 0.4 0.6 0.8 1.0 

Fringe Rate [mHz] 


Fig. 2. — Fringe rate as a function of sky position, corresponding 
to the fringe pattern illustrated in Figured Fringe rates peak at 
1.09 mHz at (5 = 0°, hit zero at the south celestial pole, and become 
negative on the far side of the pole. Grey shading indicates the 
approximate angular regions that correspond to alternating fringe- 
rate bins, assuming a fringe-rate transform taken over a two-hour 
time series. 


time domain. The result is 


X J X u;®) . f - / 


( 5 ) 


Now, suppose we implement this filter in a sliding manner 
(a convolution) in time. That is, we repeat this process 
with the fringe-rate transform centered on each instant 
in time. With this, we become interested in only t' = t, 
so the final set of filtered visibilities takes the form 


/< 


IC (*) = / dni^{r)Al'^{r,t)exp -i27r-b(i) ■ f 


(6) 


which is precisely the same as our original measurement 
equation, except the primary beam has been replaced by 
an effective primary beam, defined as 


|^-(bt X u)q) ■ r 


( 7 ) 


with * signifying a convolution. We thus see that by ju¬ 
diciously selecting fringe-rate weights, one can effectively 
reshape one’s beam. In general, however, we cannot do 
so with perfect flexibility. This can be seen by once again 
examining the combination r • b x Eor any given in¬ 
stant, b X u)^ picks out a particular direction on the 
celestial sphere. A ring of locations r on the sky at a 
constant angle with respect to this direction will have 
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the same value of r • b x and therefore the same 
fringe-rate. As a result, contours of constant fringe-rate 
always form rings on the sky, as illustrated in Figure 
By weighting different fringe-rates, one can effectively 
“turn off’ (or less harshly, to simply downweight) whole 
contours, but never portions of a contour. 

Aside from modifying the shape of one’s beam, fringe- 
rate filtering can also be used to integrate visibilities in 
time. For example, if w{f) is chosen in a way that sup¬ 
presses high fringe rates, the effect in the time domain 
will be a low-pass filter that (among other features) has 
the rough effect of averaging together data. Enacting the 
time-averaging in the fringe-rate domain is particularly 
helpful for differentiating between noise- and signal-like 
modes in the time-series data. To see this, recall that 
the relative compactness of the 7 term in Equation 0 
implies that an astronomical source located at r will prS- 
erentially appear at a fringe rate of / ~ r • b x uj^vjc 
in the data. Since r • b x u ;0 can never exceed 6 cj 0 , 
the maximal fringe-rate that can be achieved by a source 
locked to the celestial sphere is /max = huo^u/c^ where 
= |u; 0 | and b = |b|. Data at even higher fringe rates 
will likely be noise- rather than signal-dominated and 
may be filtered out safely with no loss of signal. This is 
a more tailored approach to reducing time-ordered data 
than simply averaging visibilities together in time. The 
latter can be viewed as a boxcar convolution in the time 
domain, which corresponds to applying a sine filter in 
fringe-rate space. With wings that only decay as 1//, 
a sine filter tends to incorporate data from the noise- 
dominated high fringe rate modes. A fringe-rate filter, 
in contrast, can be more carefully tailored to enhance 
modes that are sourced by actual emission from the ce¬ 
lestial sphere. 

In this section, we have provided some basic intuition 
for fringe-rate filtering, and have highlighted how it can 
be used for reshaping one’s primary beam as well as to 
combine time-ordered data. In fact, these two applica¬ 
tions are often intimately linked, since optimal prescrip¬ 
tions for combining time-ordered data (“map making”) 
involve re-weighting data by the primary beam ( Tegmark 


1997 Morales & Mateje! 


jek||2QQ9J Di 

Section w 


Dillon et al. 2U15b. We 


will return to this in Section where we will see that 

the fringe-rate filtering is a natural way to approach map¬ 
making in interferometric observations. 

3. IMPLEMENTATION 

In this section, we discuss a practical implementation 
of the aforementioned ideas. We will use this imple¬ 
mentation in simulations later in the paper to illustrate 
various applications of fringe-rate filtering. In keeping 
with its origins as a tool for analyzing data from the 
PAPER array, we simulate a model array based on PA¬ 
PER, deployed at a latitude of —30° and featuring the 
beam res ponse pattern characteristic of PAPER dipole 
elements (Parsons et al.||2008 Pober et al. 2012). The 
PAPER beam response pattern is illustrated in tne left¬ 
most panel of Eigure Eor these simulations, we also 
choose a specific baseline to examine: a pair of antennas 
separated by 30 m in the east-west direction, deployed 
at a latitude of —30°, and observing at 150 MHz. This 
baseline, hereafter referred to as our fiducial baseline^ 
corresponds to the most repeated (and hence, most sen¬ 
sitive) baseline length measured by the PAPER array in 




Fig. 3. — Top: the optimal power-spectrum sensitivity weight¬ 
ing in fringe-rate space for the XX polarization beam of our fidu¬ 
cial baseline (black) is overlaid wit h the simp l e para metrization 
for optimal weighting (red) used in |Ah et al.| ( |2015| >, which ex¬ 
cises fringe rates at risk for contamination by crosstalk, and ap¬ 
plies a Blackman-Harris window in the time domain to generate 
a compact time-domain convolution kernel with regions weighted 
below 5% of the peak response suppressed. Also illustrated are 
the weightings for matching the YY to the XX polarization beam 
(blue dashed) and f or r educing polarization leakage (blue solid) 
described in Section [4.2| as well as a weighting that balances the 
needs of sensitivity, polarization leakage, crosstalk removal, and 
off-axis foregrounds (green). Bottom: the time-domain convolu¬ 
tion kernel corresponding to the red curve in the top panel. Real 
and imaginary components are illustrated in cyan and magenta, 
respectively, with the absolute amplitude illustrated in brown. 


the maximum-redundancy arr ay configuration it uses for 


power spectral m easurements (Parsons et al. 2012a, 2014 


Ali et al. 2015). As such, our simulations demonstrate 


the performance of fringe-rate filtering in the context of 
the specific instrument configuration that has recently 
been used to place the current best upper limits on 21 cm 


emission from cosmic reionization (jParsons et al. 2014 
Jacobs et al.||2014 Ali et al.||2015 ). 


In the following sections, we will derive a number of 
different fringe-rate weights, each optimized for a differ¬ 
ent application. Often, these optimized weights depend 
on the detailed properties of one’s instrument, and can 
therefore only be computed numerical ly, n ot analytically. 
Eor example, we will find in Section |4.1| that the opti¬ 
mal fringe-rate weights for power spectrum estimation 
involve computing the root-mean-square (RMS) primary 
beam profile over contours of constant fringe rate on the 
sky (such as those shown in Eigure]^. An example of 
the binning of the RMS beam response in fringe rate is 
given by the black curve in the top panel of Eigure 
A realistic primary beam will frequently require empiri¬ 
cal modeling beyond analytic forms, making it generally 
difficult to derive a completely analytic expression for an 
optimized fringe-rate filter profile. However, in the inter¬ 
est of being able to rapidly generate filters as a function 
of varying baseline lengths and observing frequencies, we 



































































Fringe-Rate Filtering 


5 


frequently fit analytic forms (such as truncated Gaus- 
sians) to the numerical profiles, as shown by the red curve 
in the top panel of Figure As long as the numerical 
profiles take the optimized forms that we will derive in 
Section small deviations arising from an imperfect an¬ 
alytic fit are unlikely to significantly shift the final error 
properties of one’s measurements. With th e dis cussion 
of power spectrum measurements in Section |4Tj for ex¬ 
ample, we minimize the noise variance by varying the 
fringe-rate weights. Because our analytic fits to these 
weights start from a local minimum in noise variance, 
any deviations in the weighting profile will only induce 
small second-order increases in the final error bars. 

The next step in implementing the fringe-rate filter is 
translating the analytic filter profile in fringe-rate space 
into a time-domain kernel that can be used to convolve 
the simulated time series of visibilities. In effect, we im¬ 
plement the fringe-rate filter as a finite impulse response 
(FIR) filter. The convolution kernel corresponding to 
this FIR filter is shown in the lower panel of Figure 
Applying the fringe-rate filter as an FIR filter in the time 
domain, as opposed to directly multiplying the desired 
filter to Fourier-transformed visibilities, has the advan¬ 
tage that flagged or missing data can be naturally ex¬ 
cluded from the filter by neglecting FIR taps (coefficient 
multiplies) that target the missing data. The summed 
output of the FIR filter are then renormalized to ac¬ 
count for the missing samples. Another advantage of 
the FIR implementation of the fringe-rate filter is the 
potential for windowing the time-domain filter profile. 
While time-domain windowing causes further deviations 
from the ideal fringe-rate filter profile, it can be used 
to produce a more compact time-domain kernel. Re¬ 
ducing the number of time-domain samples used in the 
FIR filter improves the computational efficiency of the 
filter, helps limit the number of samples potentially cor¬ 
rupted by spurious systematics such as radio frequency 
i nterfe rence, and reduces boundary effects. In |Ali et al. 
(|2015|), a 12-hour data set was filtered in this manner, 
with boundary effects limited to an hour on each end. 
The final analysis proceeded on 8.3 hours selected from 
within the region not affected by boundary effects. 

4. APPLICATIONS 

In Section!^ we discussed how a fringe-rate filter can 
be implemented in practice once a particular form for the 
filter is selected. In this section, we optimize the selec¬ 
tion of filters (or equivalently, of fringe-rate weights) for 
various applications in low-frequency radio interferome¬ 
try. The key to this optimization will be the insight from 
Section namely that the effect of fringe-rate filtering 
can be regarded as both a time integration and a modi¬ 
fication of the spatial response of the primary beam on a 
per-baseline basis. Turning this around, one can identify 
the optimal primary beam needed for one’s observations, 
and then reverse engineer the set of fringe-rates needed to 
achieve this beam in what is essentially a “beam sculpt¬ 
ing” operation. For concreteness, we will focus here on 
21 cm cosmology, but many of the ideas presented here 
are easily translatable to other applications of radio in¬ 
terferometry. 

4.1. Minimizing thermal noise errors in power 
speetrum measurements 


In Parsons et al. (2012a) and Parsons et al. (2014), it 
was shown that estimates of the three-dimensional power 
spectrum of 21 cm brightness temperature fluctuations 
could be obtained from a single baseline by Fourier trans¬ 
forming visibility data along the frequency axis (form¬ 
ing a “delay spectrum”), and then taking the absolute 
square of the results. Here, we will show how fringe-rate 
weights can be chosen to maximize the sensitivity of a 
single-baseline-derived power spectrum. 

We begin by considering a generalization of the deriva¬ 
tion in Parsons et al. (2014|), where it was assumed that 
the primary beams of all elements in the interferometer 
are identical. We now consider the possibility of prob¬ 
ing the power spectrum via a cross-correlation of two 
baselines with different primary beams. To be clear, our 
eventual discussion will be based on the analysis of fringe- 
rate filtered visibilities from a single baseline. However, 
from Section we saw that to a good approximation, 
selecting different fringe-rates is equivalent to observing 
the sky with different effective beams. Thus, the cross¬ 
correlation of visibilities from two different fringe-rate 
bins is mathematically identical to cross-correlating two 
baselines with different beams. To begin, suppose that 
the ith baseline consists of antenna elements with pri¬ 
mary beam A^(r,z/). The delay-transformed visibility 
takes the form 


= ^ J (fW dri'Ai{u-u',ri-ri')T{u',ri'), ( 8 ) 

where g is the Fourier dual to frequenc}0 Ai is the 
Fourier transform of A^(f, z^) in both the angular and 

spectral directions, T is the brightness temperature field 
in Fourier space, is Boltzmann’s constant, A is the 
central observation frequency, and b = uA, with u = 
(iz, v). From this, we can see that two baselines with dif¬ 
ferent primary beams, but located at the same location 
on the uv plane have a delay-spectrum cross-correlation 
given by 


(Fi(u,r?)V(u,7?)*) 



(Tu' dri'P{u', rj') 


xAi{u - u',7^ - r]')Aj{u — u'.T] — g') 


~ P(U,7?) 



dftdnAi{T, iy)Aj{T, z^). 


( 9 ) 


where angular brackets (...) denote an ensemble aver¬ 
age over possible realizations of a random temperature 
field. In the first equality, we assumed that this field is a 
translation-invariant Gaussian random field specified by 
a power spectrum P(u,? 7 ), so that 


(T(u,7?)T*(u',7?')) = (10) 

In the second equality, we made the approximation that 


^ This equation can be derived by Fourier transforming Equa¬ 
tion ly along the frequency axis and re-expressing the angular 
integrm in uv coordinates assuming the flat-sky approximation. 
However, it also makes the crucial assumption that one can neglect 
the frequency-dependent nature of the mapping of baseline b to u 
coordinate. In pra ctice, this is only a reasonable approx imation 
for short baselines ([Parsons et al.|2012b| |Liu et al.|2014a|) such as 
those used for power spectrum analyses m the FAFEK e xperiment 
( [Parsons et al.|2014[ [Jacobs et al.|20T4|[Ah et al.|201^ . 
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for reasonably broad primary beams, Ai and Aj tend 
to be rather localized, which allows the comparatively 
broader P(u, rf) to be factored out of the integralj^ Fol¬ 
lowing this, we used Parseval’s theorem to rewrite the 
integral over (u, 77) space as an integral over solid angle 
and frequency. 

Rearranging Equation gives an expression for the 
true power spectrum in terms of the cross-correlation 
function of two delay-space visibilities. With real data, 
however, one cannot perform the ensemble average on 
the left-hand side of Equation Omitting this en¬ 

semble average, the copy of the power spectrum on the 
right-hand side becomes an estimator P of the true power 
spectrum P. Introducing the definition 

= ^ J dnduAi{r,i^)Aj{r,i^), ( 11 ) 

where B is the bandwidth over which observations are 
made, our estimator takes the form 

2 


QPP = - 

ij - 




Wb 


Vi{u,ri)Vj{u,ri)*, 


( 12 ) 


where we have written the power spectrum in terms 
of cosmological Eourier coordinates k, which are re¬ 
lated to the interferometric Eourier coordinates by 
{Xkx^ Xky^Ykz) = 27r(7x, t;, 77), picking up an extra fac¬ 
tor of X^Y in the processQwith 


X = 


-/■ 


dz' 

W) 


Eiz) = + zr + nA. (13) 


where c is the speed of light, z is the redshift of observa¬ 
tion, Hq is the Hubble parameter, is the normalized 
matter density, Qa is the normalized dark energy density, 
and 

- U2iHoE{zy ^ ’ 

where z/21 ~ 1420 MHz is the rest frequency of the 21 cm 
line. In the special case where there is just a single pri¬ 
mary beam, we may set i = j and drop the subscripts 


for brevity, and Equation (12) reduces to 




X^Y 


\Vin,v)f, 


(15) 


where 


^PP — ^ 


t J dndp\A{r,p)\‘^, (16) 


which is the relation found in|Parsons et al. (2014|). 

Having established these results, let us re-mterpret 
Equation as an estimator for the power spectrum 
from the cross-multiplication of two different discretized 
fringe rate bins (as opposed to the cross-multiplication of 
baselines with different primary beams). We are free to 
re-interpret our estimator in this way because of the dis¬ 
cussion in Sectionwhere we showed that each visibility 
could be thought m as being comprised of different fringe 
rate contributions, each probing a different ring on the 


® Although see Section IV A of [Liu et al.| ( |2014b| for some limi¬ 
tations of thi s approximation. 

^ See, e.g., Liu et al. ( 2014a| > for a detailed derivation. 


celestial sphere. Each fringe-rate therefore has its own 
effective primary beam, enabling our re-interpretation. 
That Equation involves the cross-multiplication of 
visibilities after they have been delay-transformed over 
the frequency axis is not a problem, since the Eourier 
transforms required to enact the delay transform and the 
fringe-rate transform commute with one another. 

Equation (12) allows a power spectrum to be estimated 


from the cross-multiplication of any pair of fringe-rate 
bins. To increase signal-to-noise on the measurement, 
however, one ought to form all possible cross-multiplied 
pairs, which can then combined into a single power spec¬ 
trum estimate via a weighted average. Suppressing the 
arguments of P and V for notational cleanliness, we can 
write 

p = (V 


where gij is the weight assigned to the cross- 
multiplication of the 7th and jth fringe-rate bins. Our 
goal is to select weights that minimize the error bars on 
the final power spectrum estimate. 

Eor our optimization exercise, assume that errors are 
due to instrumental thermal noise only. If the ith fringe- 
rate bin has a noise contribution of 74^, the noise contri¬ 
bution to our estimator is 

-fnoise — 


The error bar corresponding to this noise contribution is 
given by the square root of its variance, which takes the 
form 

Var(Pnoise) = (-^noise) ~ (-^noise) 

= E 9ij9km [{nin*nkny) - {riin*){nkuy}] 
ijkm 

= (19) 


where in the last equality we assumed that the noise is 
Gaussian, enabling the fourth moment term to be written 
as a sum of second moment (variance) terms. We further 
assumed that the real and imaginary components of the 
noise are uncorrelated with each other and between dif¬ 
ferent fringe-rate bins, so that if rii = Oi ibi, we have 
{oiOj) = (bibj) = 6ija‘^12 and {oibj) = 0 for all i and j. 

In minimizing the noise variance, care must be taken 
to ensure that there is no signal loss in the power spec¬ 
trum estimation. To do so, we first note that taking the 
ensemble average of P gives 

(P) = ^ gij {ViV*) = SY, 9iPiiP. (20) 

ij ij 


where we used an ensemble-averaged version of Equa¬ 
tion (12) to relate the true cross-correlation to the true 
power spectrum, and defined S = (P/X^T)(2 /cb/A^)^. 
Ensuring that there is no signal loss is thus tantamount 


to requiring that S^-- Qij^ij = 1, so that (P) = P. We 
may impose this constraint by introducing a Lagrange 
multiplier A in our minimization of the noise variance. 
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minimizing 

C, = ^ ^ QijQji ^ ^ ^ 

ij ij 


where both and S have been absorbed into our defi¬ 
nition of A. Differentiating with respect to each element 
and setting the result to zero gives an optimized weight 
given by g^m oc and normalizing according to our 

constraint yields 


9km 




( 22 ) 


To make intuitive sense of this, let us make a few more 
approximations. The key quan tity here is 9lij^ which 
we can see from Equation ^11[ ) is the overlap integral 
between the effective primar^HDeams of the ith and jth 
fringe-rate bins. In Sectionwe saw that if one takes the 
fringe-rate Fourier transform over a wide enough window 
in time, different fringe-rates map to different portions of 
the sky with relatively little overlap. If this is indeed the 
case, 9^^ vanishes unless i = j. Defining 


Mi = ^ y dQdpAi{r,i^) 


(23) 


we have = SijUi, so our optimal estimator for the 
power spectrum (combining Equations 17, 22, and 23) 
reduces to 

(24) 


P = 


1 /2 ^ 

Suppose we now define Vi to be an optimally 
weighted visibility in fringe-rate space. Transforming 
back to the time domain using Parseval’s theorem, one 
obtains 


P(U,7?) 



X^Y 

SET! 




(25) 


where the optimally filtered visibility in the time domain 
? 7 ; ti) is given by 


i 

L J dndiyAi{r,uf V*. (26) 


This is a rather interesting result, in that the optimal 
power spectrum estimator for a single baseline inter¬ 
ferometer consists of a squared statistic (i.e., one with 
no phase information) integrated in time. This may 
seem counterintuitive, particularly if one is accustomed 
to more conventional techniques where images are formed 
from the visibilities and averaged down before any squar¬ 
ing steps. There, it is crucial to average in time be¬ 
fore squaring, because data from different time steps can 
be sourced by the same Eourier modes on the celestial 
sphere. Integrating before squaring allows information 
from these modes to be coherently averaged together 
(since phase information has yet to be discarded), result¬ 
ing in instrumental noise that integrates down as I /Vi. 
This then becomes a 1/t dependence for the error bars 


on the final (squared) power spectrum results, and is a 
much quicker reduction of instrumental noise than if the 
data had been squared first, which would have resulted 
in a dependence on the power spectrum errors. 

In our derivation, we showed that the optimal power 
spectrum estimator can in fact be obtained by squaring 
before integrating, provided the power spectra formed at 
each time instant are first fringe-rate filtered with weights 

/id ^ i.e., where each fringe-rate is weighted by the RMS 
primary beam within the corresponding constant-fringe- 
rate contour on the sky. Essentially, the pre-processing 
step of fringe-rate filtering (with these specific weights) 
replaces the independent time samples with a set of cor¬ 
related visibilities that have effectively already been co¬ 
herently integrated in time. Note that these weights are 
not generally the same as the ones derived in Section |53| 
for optimal mapmaking, where measurements will essen¬ 
tially be weighted by an additional factor of the primary 
beam in fringe-rate space, rather than by the RMS beam 
weighting suggested here. Put another way, to obtain 
the full power spectrum sensitivity of an interferometer, 
it is insufficient to simply square the Eourier amplitudes 
outputted from a map, even if the mapmaking was op¬ 
timized to minimize the error bars of the map. Eorming 
the power spectrum in such a way would be equivalent to 
restricting gij to a form separable in i and j. This restric 
tion precludes the form for Wij given by Equation (22) 


which minimizes the error bars of the power spectrum. 

Importantly, the result that we have derived here ap¬ 
plies only when one is attempting to measure a power 
spectrum with a single baseline (or multiple baselines 
with the same baseline vector b). This is a reasonable 
limit to work in for arrays such as PAPER, where a large 
fraction of the array’s sensit ivity comes from inst anta- 
neously redundant baselines ( Parsons et al.||2Q12a ). Eor 
arrays that have less instantaneous redundancy, it be¬ 
comes more important to combine data from multiple 
baselines. If multiple baselines are involved. Equation 
(25) no longer reduces to a single sum over the time axis. 
Said differently, it is no longer true that the full sensi¬ 
tivity of an array can be obtained by averaging together 
time-slice-by-time-slice estimates of the power spectrum 
estimation from fringe-rate filtered data. Instead, the 
optimal estimator involves a double sum over time, since 
with multiple baselines of roughly the same length, it is 
possible for baselines to rotate into one another on the uv 
plane. That is, rotation synthesis becomes an important 
contribution to an interferometer’s sensitivity. 

We note that following fringe-rate filtering, the normal¬ 
ization of the power spectrum estimator must be mod¬ 
ified accordingly. This can be seen in Equation ( [2^ , 
where the scalar quantities in front of the sum are oTf- 
ferent than those found in Equation ( p!^ , which is ap¬ 
plicable to non-fringe rate filtered dataTMore generally, 
suppose we consider an auto-correlation-only estimator 
of the form 


p= j2\wiVi\\ 


(27) 


1/2 

where if Wi is set to gd ^ we recover the optimized es¬ 
timator derived above. Keeping Wi arbitrary here will 
be useful in later sections, when we examine estimators 
that are purposely non-optimal as far as instrumental 
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noise sensitivity is concerned, but may provide better 
final results due to a better mitigation of systematics. 
Like before, we may impose the condition that (P) = P, 
which requires that the weights satisfy = 1 

(where we have once again invoked the approximation 
that = 0 if i 7 ^ j). Equivalently, we may leave our 
weights initially unnormalized as one forms the combina¬ 
tion WiVi in Equation ( [^ , instead compensating with a 
normalizing denominator. Writing the estimator in this 
way, one obtains 


P = 




\2kB) B ^ ^ 


where in the last equality we re-inserted the definition 
of S and invoked Parseval’s theorem in the numerator 
to write the fringe-rate filtered visibilities in the time 
domain. Our expression now takes precisely the same 
form as Equation (15), except with fringe-rate filtered 


visibilities instead of the original visibilities, and 

w‘j fij as a normalizing beam area instead of the in¬ 
tegrated square beam ftpp. That the estimator can be 
written in such a similar form is unsurprising, since we 
showed in Equation that fringe-rate filtered visibili¬ 
ties are essentially the same as the original visibilities but 
with modified primary beams. Indeed, the term 
can alternatively be computed by simply evaluating the 
integral for but with the effective primary beam 
of Equation ([^ instead of the original primary beam. 
Explicitly, the relevant integral is , 

which becomes 


^ J dflduAi,{r,u)'^ X -r^ 


(29) 


where we made the approximation that 7 is a reasonably 
compact (delta function-like) function in Equation (^. 
Suppose we now evaluate this integral by splitting the 
sky into rings of constant fringe-rate, i.e., regions within 
which we have ht x • r equal to a constant. By con¬ 
struction, the function w is constant within each of these 
regions, and denoting its value in the jth region as Wj^ 
the integral becomes 


/ dQ-duAj{v,uf = ^ 




(30) 


completin g ou r proof that the normalization factor in 
Equation (28) is simply Vtpp computed with the effective 


primary beam. 

To be conservative, however, it is important to verify 
our analytic results using numerical simulations, since a 
number of approximations were made in our derivations. 
We use simulated observations of individual point sources 
positioned at declinations in 5° increments, traversing 
through the primary beam and fringe pattern as a func¬ 
tion of time. As illustrated in Eigure^ we inject point 
sources of unity flux density and apply the EIR imple¬ 
mentation of fringe-rate filtering described in Section ^ 
to the visibility time-series. The change in amplitude 
of the filtered visibilities at each point along the time 
axis can then be identified with a specific position on the 
sky for that point source. We bin these responses on a 


HEALpix spherical grid (Gorski et al.|2005) to determine 
the effective beam response at that location. We perform 
these simulations using both an isotropic primary beam 
(in Eimre 0 and with the PAPER primary beam (in 
Figure^. Tne effective beam responses that we recover 
are shown in Figure The artifacts that appear every 
5° in declination are a result the binned beam model 
imperfectly interpolating between source tracks at dif¬ 
ferent declinations. Aside from these artifacts, which we 
emphasize are associated with the reconstruction of the 
beam model and not with any features exhibited along 
a source track, these results compare well to the results 
shown in Figure where we show the effective beam 
response by looking up the filter response correspond¬ 
ing to the fringe rate at each sky position, i.e., by using 
Equation 0 . As these simulations illustrate, the ap¬ 
proximations we have made in our analytic derivation 
of the effect of fringe-rate filtering on the effective beam 
response are valid, with errors dominated by the effects 
of binning and interpolating between the 5° intervals in 
declination between point-source simulation tracks when 
reconstructing an effective beam. 

As a final check, we perform simulations assigning fiux 
to pixels on the sky with a Gaussian random distribu¬ 
tion and simulating the visibilities measured as a func¬ 
tion of time for our fiducial baseline. We compute the 
power spectrum amplitude for this simulated signal and 
a fringe-rate filtered version according to Equation (15) 


omitting the primary beam term Vtpp in both cases. We 
then compute the ratio of the unnormalized P{k) mea¬ 
surements before/after fringe-rate filtering and compare 
this ratio to the change in effective beam area associated 
with the chosen fringe-rate filter. Dividing the signal 
loss associated with fringe-rate filtering by the change in 
beam area, we obtain a ratio ofl.016±0.011 (Icr), indi¬ 
cating that these ratios are in agreement within the sam¬ 
ple noise of our simulation. This numerically legitimizes 
the approach advocated above, where we used approxi¬ 
mate analytic arguments to argue that Equation (15) can 
be used for power spectrum estimation provided the ef¬ 
fective beam squared integral is used instead of the beam 
squared integral for Q^pp. 

4.2. Minimizing polarization leakage 

In the previous section, we maximized our sensitivity 
to the power spectrum under the assumption that the 
measurements were limited by instrumental noise. In 
practice, however, there may be other sources of noise or 
systematics that limit our constraints. One example of 
this is the cross-contamination between Stokes terms in 
interferometric polarization measurements. Minimizing 
such contamination is of importance for 21 cm cosmol¬ 
ogy experiments that rely on the spectral axis to probe 
the line-of-sight direction at cosmological distances. For 
these experiments, Faraday rotation combines with a 
spurious coupling between Stokes terms (typically Q to 
I) to produce polarization leakage whose spectral struc- 


signal (iJelic et al. 

2008 

2010 2014 Bernard! et al.|2013t 

IVIoore et al.|2U13[ 

2015 

). Current interferometers target- 


MWA, PAPER, HERA, CHIME, LEDA) all employ lin¬ 
early polarized feeds, primarily because of their ease of 
construction and ability to co-locate elements sensitive to 
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Fig. 4. — The real component of the fringe amplitude simu¬ 
lated for our fiducial baseline deployed at a latitude of —30° before 
(black) and after (red) the application of the fringe-rate filter de¬ 
scribed in Section From top to bottom, the panels illustrate 
fringes for point sources passing through the fringe pattern at de¬ 
clinations of 0°, —30°, and —60°, respectively. In this simulation, 
antenna elements have isotropic primary beams. 



Hour Angle (radians) 


Fig. 5. — Same as Fi gurebut for antenna elements with the 
PAPER primary beam jParsons et al.|2010[ |Pober et al.|2012| ). 


orthogonal polarizations. However, orthogonal linearly 
polarized feeds in practice have pri mary beam responses 
that do not match. As described in|Moore et al. 1(2013) 


and Jelic et al. (2010), if left uncorrected, the unniatched 
beam response between visibilities Vxx and Vyy measur¬ 
ing the XX and YY polarization products, respectively, is 
the dominant source of polarization leakage in the Stokes 
I measurement Vi = {Vxx-\-Vyy)/‘^ for linearly polarized 
feeds. 

With an accurate beam model, it is trivial to rescale 
Vxx and VVy so that the XX and YY beam responses 
match in one particular direction (typically the zenith). 


Their sum, Td, then represents a perfect probe of the 
Stokes I parameter in that chosen direction, but will 
contain contamination from Vq = (Vxx — kYY)/2 in 
directions where the XX and YY beam responses do 
not match. More precisely, suppose the XX polariza¬ 
tion product has an antenna power beam of Axx, while 
the YY polarization product has an antenna power beam 
of Ayy- The primary beam that we have denoted A(f) 
in previous sections is given by {Axx + ^yy)/ 2, which 
will be labeled as A^(r) in this section to emphasize its 
meaning as the antenna response to the Stokes I sky. 
Defining A^{t) = {Axx — Ayy)/2, the Stok es I visibility 
takes the form (see, e.g., [Moore et al.||2Q15 ) 


Vbu{t) = j /zy(r)A^(r, t) exp — i27r-b(t) • r 


J dnQ^{r)A^{r,t) 


exp 


—i27T—h{t) • f , (31) 
c 


where Qj^ denotes the Stokes Q sky at frequency u. The 
second term represents the Q to I leakage in the visibil¬ 
ity measurement, which does not vanish if Axx 7^ Ayy, 
i.e., if there are any asymmetries in the beams. For a 
single baseline at a given instant in time, this leakage 
term cannot be eliminated. The heart of the problem 
is the impossibility of creating a match between a pair 
of two-dimensional functions (the XX and YY beam re¬ 
sponses) with a single degree of freedom (the amplitude 
of Vxx relative to VVy)- In order to improve the match 
between polarization beams in interferometric measure¬ 
ments, many interferometric measurements from distinct 
points in the uv plane will have to be combined with ap¬ 
propriate weights to effect a re-weighting of the sky along 
two dimensions. 

A commonly used technique for correcting the mis¬ 
match between the XX and YY polarization beams is 
to separately image these polarization products, correct 
each pixel in each image using modeled beam responses, 
and then to sum the corrected images together to form 
a Stokes I map (e.g., [Sullivan et al.||2Q12 Bernardi et al. 
2013 Asad et al.[[2015| . IVlathematically, this technique 
is identical to convolving the sampled uv plane by the 
Fourier transform of the directionally-dependent correc¬ 
tion applied in the image domain. For an ideal array that 
samples the uv plane at scales significantly finer than the 
aperture of a single element, this technique can in prin¬ 
ciple perfectly correct mismatches between the XX and 
YY polarization beams. In practice, the success that can 
be achieved with this technique depends strongly on an 
array’s uv sampling pattern. 

Take, for example, the case of a sparsely sampled uv 
plane where the spacing between uv samples is much 
greater than the aperture scale of a single element. In 
this case, the beam correction described above convolves 
each uv sample with a kernel whose size scales roughly 
as the size of the aperture of a single element in wave¬ 
lengths. Since this kernel is much smaller than the spac¬ 
ing between uv samples, each point in the convolved uv 
plane is dominated by the product of a kernel weight and 
a single visibility measurement. As such, for a chosen uv 
coordinate, the level of leakage in the Stokes I uv plane 
effectively reduces to what can be achieved using a single 
number to rescale Vxx and Vyy before summing. 

In image domain, the baseline fringe pattern has been 
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Fi g. 6. — The effective primary beam response of a baseline, as reconstructed from point-source simulations described at the end of Section 
|4.1| and illustrated in Figures^ andThe left panel shows that PAPER’S model beam response is recovered from unfiltered visibilities; the 
center panel illustrates the beam weighting that results from applying a fringe-rate filter tuned to optimize sensitivity for power spectrum 
measurements, assuming an isotropic primary beam; the right panel shows the effective response after applying this fringe-rate filter to 
data including PAPER’S model beam response. The periodic structure along the vertical axis is an artifact of reconstructing beams from 
point sources spaced every 5° in declination; we emphasize that they are not associated with any fundamental structure associated with 
fringe-rate filtering. 



Fig. 7. — The model primary beam response of a baseline, simulated analytically usi^ the interpretation of fringe-rate filtering as a 
spatial filter acting along fringe-rate contours. Panels follow the same order as in Figure 


integrated across the sky weighted by two different (XX, 
YY) beam responses to produce a visibility. This means 
that an interferometric baseline for these two polariza¬ 
tions samples the uv plane convolved by two different 
kernels — the Fourier transforms of the XX, YY beams, 
respectively. Because the resulting Yxx and Vyy visibili¬ 
ties no longer retain any direction-dependent information 
(they were integrated over angle) kxx and VVy cannot 
be re-weighted in a way that produces pure Stokes I re¬ 
sponse in all directions, except in the trivial case where 
the YY beam differs from the XX beam by a direction- 
independent multiplicative factor. Said another way, 
given a single degree of freedom in the relative weighting 
of kxx and Vyy, h is not possible to guarantee a pure 
Stokes I response in more than one direction. 

As mentioned above, it is common practice in synthesis 
imaging to grid visibilities in the uv plane with convolv¬ 
ing kernels that contain direction-depend ent information 


Exam ples include W-projection kernels (Cornwell et al 


2003), the Fourier tra nsform of the prima^ beam used in 


optimal map m aking ( [Morales fc Matej^ 2009b 


Bhatna- 


gar et al. 2008), and (relevant here) the Fburier transform 


of the inverse of direct ion-dependent Mueller matrices 
(Tasse et al. 2013). Indeed, in the case of a completely 
sampled uv plane, the inverse Mueller beam kernel would 
perfectly match the XX and YY beams. However, as 


the sampling of the uv plane becomes sparser, the re¬ 
moval of samples removes degrees of freedom in weight¬ 
ing as a function of direction. In the limiting case of 
a very sparse array where the convolving kernels of uv 
samples do not overlap, we revert to the single direction- 
independent weighting factor above. 

Another way of thinking about this is to realize that, 
by gridding visibilities in the uv plane with a convolv¬ 
ing kernel, we are attempting to reassign sky intensity 
from the visibility, which was integrated over in equa¬ 
tion back to its original location so that it may be 
re-weighted to match the XX and YY polarization re¬ 
sponses. This inversion is only well posed in the case of 
a completely sampled uv plane. As we lose samples, we 
lose the ability to reconstruct the original sky intensity, 
and hence, to perfectly match the XX and YY polariza¬ 
tion responses as a function of direction. In the case of 
a sparsely sampled uv plane, it is only possible to con¬ 
struct a pure Stokes I response as a function of direction 
if one has sufficient additional information about the sky 
(for example, if it sparsely populated by point sources) 
that one can reconstruct the missing modes and hence, 
fully invert the uv plane. 

For cases where uv sampling falls somewhere between 
the sparse and the oversampled cases described above, 
the level of primary beam correction that can be realized 





































Fringe-Rate Filtering 


11 



Fig. 8. — Beam response patterns illustrating the application of fringe-rate filtering to minimizing polarization leakage. Panels from 
left to right illustrate the response before matching XX and YY polarization beams, the response after such matching, the subsequent 
application of an optimal power-spectrum sensitivity filter, and the application of a filter optimizing both sensitivity and polarization 
match. The top row depicts the Stokes I beam response in logarithmic units; the bottom row shows the polarization match (XX — YY)^ 
in linear units. The fringe-rate filters optimizing sensitivity and polarization match corre spond to the red and green curves in Figure!^ 
respectively. The third column most closely corresponds to the fringe-rate filter applied in |Ah et al.| ( |2Q1^ . 


is more complicated. Ultimately, the Fourier relationship 
between the uv plane and the image dictates that sam¬ 
ples that are nearby to one another in the uv plane enable 
primary beam corrections on the largest angular scales, 
while samples that are farther apart contribute to correc¬ 
tions on finer angular scales, with the orientation of the 
samples relative to one another dictating the axis along 
which such corrections take effect in image domain. Typ¬ 
ically, earth-rotation synthesis is required to sample the 
uv plane densely enough to allow for effective beam cor¬ 
rection for diffuse structur^ configurations are not dense 
enough to fully correct the beam even then. One partic¬ 
ularly relevant case that falls in this last category are 
many of the maximum redundancy configurations cur¬ 
rently favored by several 21 cm cosmology experiment s 
for their sensitivity benefits (Parsons et al |2012a 2014). 

However, even in the single-baseline case, earth- 
rotation synthesis provides dense uv sampling along one 
direction: the direction the baseline traverses in the uv 
plane. The appropriate convolution kernel can combine 
samples along this track so as to correct the primary 
beam mismatch along one axis. Of course, what we 
have just described—a convolution kernel acting along a 
time series of samples from a single baseline—is precisely 
fringe-rate filtering. Put another way, since fringe-rate 
filtering has the effect of modifying one’s primary beams, 
it is possible to tailor fringe-rates to improve the match 
between the XX and YY polarization beams. In the case 
of sparse array sampling, fringe-rate filtering reproduces 
what can be achieved by independently imaging the po- 


® As mentioned earlier in this section, it is possible to correct 
for direction-dependent polarization beams toward sparse point 
sources using more widely separated visibility samples. It is diffuse 
structure (which is more localized in the uv plane) that poses the 
greatest challenge. 


larization products. While this is not as effective at mit¬ 
igating polarization leakage as can be achieved through 
imaging in the dense sampling case, we will now show 
that it nonetheless represents a substantial improvement 
over the naive summing of XX and YY visibility mea¬ 
surements. 

Suppose for every frequency channel in our observa¬ 
tions, we compute the RMS ^xx(r) along the spatial 
ring corresponding to each fringe-rate bin, using the re¬ 
lation / V'hxiv^u/c derived in Section^ The result is 
a one-dimensional beam profile in fringe-rate space. Re¬ 
peating the same exercise for Hyy, one can then form the 
ratio between the two profiles, quantifying the mismatch 
between the XX and YY beams in a one-dimensional 
projection. This ratio is plotted for PAPER as the blue 
dashed curve in the top panel of FigureIf one then uses 
this curve as a set of fringe-rate weights for the VVy, the 
resulting effective beams will be more closely matched to 
one another, which we will demonstrate later in this sec¬ 
tion when we quantitatively estimate polarization leak¬ 
age for various fringe-rate filtering schemes. Note that 
we use the RMS rather than the beam itself because ul¬ 
timately we seek a measurement of the power spectrum, 
which is a squared quantity. 

In the second column of Figure we show the effective 
beam (top row) and the beam mismatch (i.e., A^(f); bot¬ 
tom row) after performing our beam matching procedure. 
One sees that the mismatch is slightly mitigated com¬ 
pared to the original unweighted case (leftmost column). 
However, substantial power remains in the mismatched 
beam A^{r) because fringe-rate filtering can only alter 
beam shapes in a one-dimensional family of fringe-rate 
rings (as we discussed in Section]^. A perfect matching 
would require a set of two-dimensional weights. 

Interestingly, the beam matching can be improved 
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with one further weighting that is not specifically tar¬ 
geted at polarization matching, namely the sensitivity- 
optimized weighting pr oposed in Section |4.1[ These were 
the weights applied in Ah et al. (2015), and the results 
of applying these weights to combined Vj visibilities fol¬ 
lowing the beam-matching procedure are shown in the 
third column of Figure While this extra weighting 
step was not designed to deal with polarization in mind, 
it is still effective at mitigating leakage because it down¬ 
weights portions of the sky that have some of the worst 
beam mismatches. 

Finally, one may use knowledge of the mismatched 
beam A^(r) to further inform one’s weighting. In par¬ 
ticular, one may take the scheme that we have outlined 
so far (applying fringe-rate filters to first match XX and 
YY beams, then to maximize sensitivity), and add one fi¬ 
nal filter that downweights portions of the sky where the 
mismatched beam is known to be large. In the fourth col¬ 
umn of Figure we additionally weight each fringe-rate 
by the ratio of the RMS A^(r) fringe-rate beam profile to 
the RMS A^{y) fringe-rate beam profile. The effective 
fringe-rate weights that are applied to the Stokes I vis¬ 
ibilities are given by the green curve of Figure From 
Figure 1^ we see that this hybrid sensitivity- and%akage- 
optimized weighting significantly reduces the beam mis¬ 
match. 

In minimizing polarization leakage, one must be care¬ 
ful not to enact an overly aggressive scheme that narrows 
the effective beam to such an extent that there is a sub¬ 
stantial loss of power spectrum sensitivity. To quantify 
this trade-off, consider a generalization of Equation ^ 
to include polarization leaka ge. Performing a derivation 
similar to the one in Section [TT] yields 


{Viin,rj)V^in,rjr) ^ S 


Pf(k)n(/ + PQ(k)n«« 


(32) 


where S' = {B/X^Y){2kB/X^)^ and {Xk^, Xky,Yk,) = 
27r(i4,'T’, 77 ), as before. The indices i and j again index 
fringe-rate bins. Beam integrals and are de¬ 

fined analogously to Equation 0 , but with superscripts 
indicating the Stokes components. The Stokes I power 
spectrum Pj is the same as the power spectrum P de¬ 
fined in Section [4T| with the new notation serving only 
to distinguish it from the Stokes Q power Pq. In deriv¬ 
ing Equation (32), we made two key assumptions. The 
first is that the I and Q contributions to the sky are 


on average uncorrelated. Empirically, this appears to 
be the case dWiering a et al. 1993 Gaensler et al. |200I 
Bernard! et al. ||20U3| ) and physically, one expects this tc 


to 

be so since foreground interstellar medium clouds affect 
polarized and unpolarized emission differently. The sec¬ 
ond assumption is that the polarized sky is describable 
by a power spectrum. At some level, one expects this as¬ 
sumption to fail, as there exist bright polarized sources 
that are not accounted for in a random realization of 
some power spectrum. However, since our ultimate goal 
is to measure power spectra, it is reasonable to define an 
effective Stokes Q power spectrum. 

Like before, rather than considering individual corre¬ 
lations between different fringe-rate bins, one may insert 
fringe-rate filtered delay-space visibilities on the left- 
hand side of Equation (32). The same equation then 
holds with the fringe-rateoin indices omitted (since the 


fringe-rates have already been combined in a weighted 
combination) and and replaced by the square 
integral of the effective beams, which we denote and 

respectively. One sees then that a suitable estimator 
of the Stokes I power spectrum is 


^/(k) = 


\v 


(33) 


Ensemble-averaging both sides and inserting the fringe- 
rate filtered version of Equation (32) yields 


a 


QQ 

eff 


(P,(k)) = PKk) + ^PQ(k). 




(34) 


The second term is the polarization leakage in our final 
power spectrum estimate. The key quantity is 
which quantifies the effectiveness of one’s polarization 
leakage suppression scheme. On the other hand, if the 
fringe-rate filtered visibilities have an instrumental noise 
variance of (as we assumed in the previous section), 

the instrumental noise errors APj in our power spectrum 
estimator are 


AP,(k) = (Var [-P/(k)]) 


^filt 

sni^- 


(35) 


Eor instrumental noise sensitivity, then, the crucial quan¬ 
tity is In Table Ih we list the value of this 

metric (rightmost column) tor the various weighting 
schemes considered in this section, normalized to the 
value obtained with no fringe-rate filtering. We also 
show the integrated power beam J A®^(f)(ifl, the square- 
integrated power beam = f the effec¬ 

tive (noise equivalent) integration time implied by each 
fringe-rate filter, and the fractional power spectrum leak- 
age 

At this point, we should emphasize that because fringe- 
rate filtering performs a weighted combination of sam¬ 
ples over a wide time interval, the instrumental noise 
errors APj are substantially correlated between nearby 
times samples. The number of independent modes pre¬ 
served for power spectrum analysis corresponds to the 
number and weighting of the filtered fringe-rate modes. 
Even without filtering, quantifying the number of sam¬ 
pled modes is non-trivial, as the time dependence of a 
baselines projection (and hence, uv mode sampling) to¬ 
ward a patch of sky varies substantially over the wide 
fields of view of low-frequenc y interferometers. As dis¬ 
cussed in Parsons et al. (2014), to accurately account for 
the statistical interdependence of these modes in either 
the fringe-rate filtering or square time-domain integra¬ 
tion case, it is necessary to use bootstrapping to deter¬ 
mine errorbars. 

With only a polarization-matching filter applied 
weighting the YY polarization beam to best match the 
XX polarization beam (second row of Table Q second 
column of Eigurej^, we see that the fractionalpolariza- 
tion leakage is reduced from 1.64 x 10“^ to 1.26 x 10“^. 
Applyi ng a n additional sensitivity weighting step from 
Section [4T] (third row of Table third column of Eigure 
further reduces the fractionm polarization leakage to 
0.88 X 10“^, and though there is a 25% reduction in effec- 
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TABLE 1 

Sensitivity and polarization leakage metrics for different fringe-rate filters. 



Integrated 
power beam [sr] 

Square-integrated 
power beam [sr] 

Effective 

integration time [s] 

Polarization leakage 

Normalized power 
spectrum sensitivity 

No fringe-rate filter 

0.74 

0.32 

420 

1.64 X 10-3 

1. 

Polarization-matched fringe-rate filter 

0.74 

0.32 

420 

1.26 X 10-3 

1. 

Sensitivity-optimized fringe-rate filter 

0.51 

0.24 

2930 

0.88 X 10-3 

1.9 

Polarization and sensitivity-optimized 

0.36 

0.15 

5570 

0.68 X 10-3 

1.7 


tive beam area, the power spectrum sensitivity is boosted 
by almost a factor of two. This improvement takes into 
account the improvement in sensitivity of each indepen¬ 
dent power-spectrum sample, but also the correspond¬ 
ing decrease in the number of independent time samples 
available. (Recall that fringe-rate filtering effectively av¬ 
erages together nearby time samples, and thus returns a 
time series where instrumental noise is no longer inde¬ 
pendent from time slice to time slice). Finally, applying 
another filter to minimize leakage (fourth row of Table 

fourth column of Figure]^ green curve of Figure]^, 
results in another ^ 25% decrease in polarization leakage 
with only a ^10% degradation in sensitivity. This degra¬ 
dation is associated with the longer integration times re¬ 
quired to produce the narrow fringe-rate weighting profile 
needed for polarization matching. This results in sensi¬ 
tive regions of the beam being underweighted because of 
their leakage. 

Our final, best-performing weighting scheme for mini¬ 
mizing polarization was in some sense a rather arbitrary 
weighting. This is unavoidable, since the power spec¬ 
trum of low-frequency polarized emission is unknown at 
fine angular scales. We now show how our final weight¬ 
ing step may be optimized if the polarized power spec¬ 
trum is known. Suppose we estimate the power spec¬ 
trum by forming weighted averages of the visibility cross¬ 
correlations between all pairs of fringe-rate bins. The 
general form for such an estimator is gi ven b y Equa¬ 
tion (17). However, as we found in Section [tH one may 
approximate the different fringe-rate bins as being un¬ 
correlated. The same approximation can be made here, 
allowing us to define and by letting = Sijjj^l 

and = Sijjj^f. With no correlations between fringe- 
rate bins, it becomes sufficient to once again use Eoua- 
tion (27). Taking the ensemble average of Equation ( |27[ ) 
but this time inclu ding polarization leakage terms, i.e., 
inserting Equation (32) rather than Equation gives 


T) = FE + F E Pq- (36) 


If, as before, we require our weights to be normalized 
so that = 1, the first term gives an unbiased 

estimator of the Stokes I power spectrum. The second 
term is the bias in our final power spectrum estimate due 
to leakage from Q to L Ideally, the weights are chosen 
to mitigate this contribution. However, in attempting 
to minimize this systematic, one must be careful not to 
pick weights that dramatically amplify the instrumental 
noise contribution. We therefore choose to minimize an 
overall “variance” that is the noise variance of Equation 
(19) plus the square of the Stokes Q power spectrum bias 
(Essentially adding the error bars from instrumental noise 


and polarization leakage in quadrature). In other words, 
we minimize 

C = • m + {SPq)‘^ (m • — Am • , (37) 

where we have introduced a Lagrange multiplier A to 
enforce our normalization constraint, and have grouped 
the different fringe-rate weights into a vector m = 
(ref, re 2 ,...), with the beam overlap integrals similarly 
grouped into vectors and fji^. Now, notice that if we 
define 

H = (38) 

our expression can be written as 

C = m^Hm — (39) 

Differentiating this, setting the result to zero, and solving 
for the normalized weights gives 




(40) 


This expression involves H and can be evaluated ex¬ 
plicitly in our case using the Woodbury formula, giving 


p2^'3(M«) 

+ P^^lQ ■ 


where P/v is the noise power spectrum, which is equal 
to the RMS visibility noise divided by by S. If the sky 
were completely unpolarized (Pg = 0) or there were no 
polarization leakage due to mismatched beams = 0), 
then (x I, and the optimal weighting would take the 
form m oc . Recalling that each component of a 
vector corresponds to a different fringe rate, we see that 
m 


(X jjL ^ is precisely the result derived in Section 


4.1 


where each fringe-rate bin was weighted by the integratec. 
square of the beam within the bin. 

With polarization leakage. Equation (40) shows that 
as a first step, one should still weight oy the square- 
integrated beam within each fringe-rate bin. However, 
one should then further weight by the more complicated 
form for given by Equation (41). To gain some 

intuition for this operation, consider the limit of zero 
instrumental noise. One then obtains 


H 


Pn=0 


(Xl- • 11^) (42) 


which we immediately recognize as a projection matrix 
that projects out the vector . Since the components 
of 11 ^ encode the polarization leakage response in various 
fringe-rate bins, projects out linear combinations of 
fringe rates that are indicative of Q to I leakage due to 
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beam asymmetries. At the other extreme, if the polar¬ 
ization power spectrum is weak compared to the noise 
power spectrum, we have 


H 


-1 


Pn^Pq 


cxI- 


Pq 

Pn 


(43) 


which is similar to a projection operator, but with the un¬ 
wanted or projected piece tempered by the square of the 
small parameter Pq/P/v- Intuitively, even modes that 
are contaminated by polarization leakage contain cosmo¬ 
logical information, and it is advantageous to avoid too 
drastic a projection if possible. An overly aggressive sub¬ 
traction of leakage modes results in a loss of cosmological 
signal, which when corrected for by the altered normal¬ 
ization of the power spectrum, results in a magnification 
of the error bars. For the general case of intermediate 
noise levels, the matrix smoothly interpolates be¬ 
tween the two extremes. 

While mathematically optimal, the fringe-rate weight¬ 
ing proposed here is currently difficult to put into prac¬ 
tice. This is because the Q power spectrum Pq has yet to 
be positively measured at low frequencies. For example, 
recent measurements in 
upper limits on Pq, at 


Moore et al. (2015) provide only 
east at the ^-scales that are the 
most promising for a first detection of the 21cm signal. 



Frequency [MHz] 


4.3. Minimizing Instrumental Systematics and 
Off-Axis Foregrounds 

A final application of beam sculpting with fringe-rate 
filters targets the suppression of systematics in data. We 
will consider two systematics: additive phase terms as¬ 
sociated with instrumental crosstalk, and sidelobes as¬ 
sociated with celestial emission outside of the primary 
field of interest. Both of these applications are closely 
aligned with the original application of fr inge-rate fil¬ 
ters described in Parsons & Backer (2009). Reduction 
of sidelobe contamination (as well as radio frequency in¬ 
terference mitigation) using a sini ilar technique was also 
discussed in Offringa et al. (2012). 

For the purposes of this discussion, we consider in¬ 
strumental crosstalk to be a spurious correlation intro¬ 
duced between otherwise uncorrelated signals as a result 
of electromagnetic coupling in the instrument (typically 
between adjacent, unshielded signal lines) or because a 
non-celestial source has injected a correlated signal (e.g. 
switching noise on power supplies). Alth ough crosst alk 
can be suppressed using phase switching (Ryle||1952|), it 
is always present at some level in interferometric obser- 
vations. If it is temporally stable, however, it is possible 
to significantly suppress crosstalk in data by averaging 
visibilities over a long period (so that the fringing celes¬ 
tial signal washes out) and then subtracting the average 
complex additive offset from the data. This tech nique 
has long been applied to, e.g., PAPER observations ([Par¬ 
sons et al.||201Q| |Pober et al.||2Q131 Jacobs et al. 2U14[ 


^arsons 


et al. 


Ali et al.||201^. 


As a time-domain filter, this crosstalk removal tech¬ 
nique can also naturally be understood as a notch fil¬ 
ter for removing signals with zero fringe-rate. Because 
crosstalk removal uses a finite time interval for comput¬ 
ing the average, applying this notch fringe-rate filter has 
the effect of removing emission from the region of sky 
corresponding to the zero fringe-rate bin. As illustrated 


Fig. 9. — Top: A proje ction of the global sky model of |de| 
jOliveira-Costa et al.j (|2008|) as viewed from latitude —30° at a 
local sidereal time of 1)5:44, overlaid with the PAPER beam re¬ 
sponse contours before (black) and after (red) the application of 
a fringe-rate filter optimized for sensitivity. Bottom: a simulation 
of the corresponding RMS visibilities for a 30-m east-west baseline 
before (black) and after (red) the same fringe-rate filtering. The 
reduced presence of foregrounds in the simulated visibilities after 
fringe-rate filtering can be understood as coming from the spatial 
filter enacted by fringe-rate filtering. 


in Figure for our fiducial baseline, this corresponds to 
the unshaded region intersecting the south celestial pole. 
For PAPER, this region is sufficiently low in the beam 
that its removal has little impact, but in general, sub¬ 
sequent analysis of crosstalk-removed data may require 
accounting for the beam-sculpting effects of the crosstalk 
removal filter. 

Thus, when considering instrumental systematics, 
there may be additional criteria that influence one’s 
choice of fringe-rate filter besides optimizing signal-to- 
noise; one may choose to excise the zero fringe-rate bin 
to improve data quality at a very modest cost to sen¬ 
sitivity. Similarly, it is common to encounter situations 
where celestial emission that is low in the primary beam 
is bright enough to introduce undesirable sidelobe struc¬ 
ture or other systematics in observations targeting an 
area nearer to beam center. In this case, one may again 
find it desirable to depart from optimal weighting de¬ 
rived in Section |4.1| by further down-weighting regions 
of low sensitivity in order to gain improvements in fore¬ 
ground systematics. This application of fringe-rate fil¬ 
tering is particularly relevant for 21cm cosmology experi¬ 
ments where approximately Gaussian signals are overlaid 
with highly non-Gaussian foregrounds. Fringe-rate filters 
that are informed by the angular structure in foreground 
models can substantially suppress foreground systemat¬ 
ics while having little impact on a statistically isotropic 
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Gaussian signal. 

In Figure we illustrate this technique using simula¬ 
tions of a singl e baseline. The sky model that we em¬ 
ploy is that of |de Oliveira-Costa et al.| (|2008|) , and in 
the top panel of the hgure we show the field of view of 
our simulated interferometer at a particular instant in 
time. Primary beam contours for the PAPER instru¬ 
ment are shown in dashed black, and one sees that there 
is substantial power from bright Galactic plane emission. 
In solid red are the contours for the effective primary 
beam following the application of a p owe r-spectrum op¬ 
timized fringe-rate filter (see Section 4.1). The reduced 
effective beam size means that less of the Galactic plane 
contributes to the measured power. This can be seen in 
the bottom panel of Figure where we show the cor¬ 
responding RMS visibilities. The red curve is clearly 
reduced compared to the black curve, indicating reduced 
foreground contamination. If desired, the contamination 
can be further mitigated by picking a different set of 
fringe-rate weights that downweight the edges of the pri¬ 
mary beam even more. However, this extra foreground 
suppression must be carefully balanced with the increase 
in instrumental noise error bars that inevitably results 
from the narrowing of the effective beam beyo nd th e size 
implied by the optimized procedure of Section [tT| 


5. FRINGE-RATE EILTERING AS MAPMAKING 
EROM TIME-ORDERED DATA 

In the previous sections, we have focused on applica¬ 
tions of fringe-rate filtering that operate on a single base¬ 
line basis. These applications are particularly powerful 
for maximally redundant arrays such at PAPER, which 
have most of their sensitivity concentrated in multiple 
identical copies of a small handful of baseline types. By 
design, maximally redundant arrays are not optimized 
for imaging, which instead require arrays that sample a 
large number of unique baselines. In this section, we 
turn our attention to such arrays, tackling the imaging 
(i.e., mapmaking) problem for multi-baseline arrays. We 
will find once again that the fringe-rate space is partic¬ 
ularly well-suited for implementing time integration for 
interferometric data. 

Suppose our time-ordered visibilities are grouped into 
a measurement vector v of length where is the 

number of baselines, and Nt is the number of snapshots 
taken in time. If we represent the true sky as a vector x 
of length A^pix, and our instrument’s response as a matrix 
A of size Ni,Nt x Apix, the measurement equation is given 
by 


V = Ax + n. 


(44) 


Matejek||2QQ9a Dillon et al.||2QI5 ) 

x = MA’^N-^ 


(45) 


where M is some invertible matrix chosen by the data 
analyst, the dagger signifies an adjoint, and N is the 
noise covariance matrix, defined as (nn’l'), with angled 
brackets denoting an ensemble average. Again, our vec¬ 
tor/matrix expressions are basis-independent, so even 
though the formation of x is often described as “map¬ 
making”, it need not correspond to spatial imaging in 
the traditional sense of the word. A similar approach 
can also be found in “A-projection” alg orithms (jBhatna- 
gar et al.]|2QQ8 , 2013 Tasse et M^|2QI3 ). 


4'he error bars on the estimator x are obtained by com¬ 
puting the square root of the diagonal elements of the 
covariance 5], which is given by 


S = ((x - x)(x - x)1') = MA^’N-^AM^ 


(46) 


With a suitable choice of M, the estimator given by 
Equation (|4^ minimizes the variance. Regardless of 


one’s choic e, however, Eq uation (45) can be shown to 
be lossless (|Tegmark|I997| , in the sense that any quanti¬ 
ties (such as power spectra) formed further downstream 
in one’s analysis will have identically small error bars 
whether one forms these data products from x or chooses 
to work with the larger and more cumbersome set of orig¬ 
inal data V. 


In principle. Equation (45) is all that is needed to op¬ 
timally estimate the true sky. One simply forms the rel¬ 
evant matrices and performs the requisite matrix inver¬ 
sions and multiplications. However, this is computation¬ 
ally infeasible in practice, given that modern-day inter¬ 
ferometers are comprised of a large number of baselines 
operating over long integration times, resulting in rather 
large matric e s. Thi s is what motivated the authors of 
Shaw et al. (|20I3b) to propose their m-mode formal¬ 


ism, essentially rendering many of the relevant matrices 
sparse, making them computationally easy to manipu¬ 
late. While the m-mode formalism is a general frame¬ 
work that can be used to solve a variety of problems 
(such as mitigating foreground contamination), our goal 
here is to develop similarly convenient techniques for the 
mapmaking problem (i.e., the formation of x), with much 
detail devoted to the intuition behind how our optimal 
estimator operates for an interferometer. 

5.1. The general sub-optimality of time integration 

We begin by showing that it is suboptimal to make 
maps by integrating visibilities in time. Writing out 
Equation 0 for the visibility Vbu{t) with an explicit co¬ 
ordinate system, we have 


where n is a noise vector. Note that in this general form. 
Equation (44) is not basis-specific. Eor example, while it 
is often useml to think of x as a vector containing a list 
of temperatures in a set of pixels on the sky (hence the 
variable name A^pix), h is equally valid to employ another 
basis, such as spherical harmonics. Similarly, while we 
call V the time-ordered data, it need not be a time series, 
and in fact, we will see that a description in fringe-rate 
space is in fact quite natural. 

Given our measurement v, the optimal estimator x of 


the true sky x is given by (Tegmark 1997 Morales & 


Vbu{t) = j A^{r,t)Ii,{r)exp 


—TI'k ( ^ COS f] sin 6 


X exp 


^0 


—i27r ( — cos 6 sin(a — uj^t) 


dTt -|- n(t), (47) 


where n{t) is the instrumental noise, a and 5 are the 
right ascension and declination, respectively, r] is the ge¬ 
ographic latitude of the array, and bo = yo^+^^^in^, 

where bx and by are the east-west and north-south base¬ 
line lengths, respectively. We have chosen our definition 
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of t = 0 to conveniently absorb an arbitrary constant 
phase. Like before, we are assuming that the primary 
beam is fixed with respect to local coordinates and trans¬ 
lates azimuthally on the celestial sphere. We additionally 
assume that the baseline is phased to zenith. In other 
words. Equation (47) describes an interferometer observ¬ 


ing in a drift-scan mode. 

To see how integrating in time may be suboptimal, 
consider a simplified, purely pedagogical thought ex¬ 
periment where our interferometer consists of a single 
east-west baseline {by = 0) situated at the equator 
(?7 = 0). For the primary beam, suppose we have a beam 
that is extremely narrow in the polar direction, so that 
Ajy{T,t) = 6^{6)A^{a — where 6^ signifies a Dirac 

delta function. Plugging these into restrictions into our 
equation, we obtain 

Vbi^it) = j {a- (<5 = 0, a) 


X exp 


-i27r-^ sm{a — oo^t) 

A 


da + n{t). (48) 


For a single baseline, the function Vbu{t) is precisely the 
continuous version of the discrete data vector v. To ob¬ 
tain V, then, one would simply sample Vbi^{t) discretely 
in time. For a multi-baseline array, forming v involves 
following the above procedure for each baseline, and then 
concatenating the resulting vectors to form a single long 
V vector. To make our analytic manipulations more con¬ 
venient, however, we will keep t a continuous variable, so 
that V is a hybrid quantity, discrete in baseline but con¬ 
tinuous in time. Acting on v by a matrix then involves 
summing over baselines and integrating over time. 

Identifying n{t) and li^{0 = 7rl2,ip) as the continu¬ 
ous versions of n and x respectively, the rest of Equation 
(48 )’s integrand can be interpreted as the continuous ver¬ 
sion of A. We can model the noise covariance between 
baselines b and b' at times t and t' as 


Nhb'{tA') = cr Sbb'S^{t - t'). 


(49) 


where a is an RMS noise level assumed to be uncorrelated 
in time and uncorrelated between baselines. 


To see how the optimal prescription of Equation (45) 
combines information from different times, we need only 
evaluate A’t'N“^v, for the M matrix has no time index, 
so its application has no impact on how time-ordered 
data is combined. In our toy model, we have 


to express visibilities in fringe-rate space, and then to 
weight different fringe-rates appropriately before sum¬ 
ming. We will devel op t his method for mapmaking more 
generally in Section [O] 


5.2. The special case where integrating in time is 
optimal 

Before proceeding, it is instructive to establish the 
special case where time integration is the optimal tech¬ 
nique for an initial data reduction step in mapmak¬ 
ing, since it is frequently employed in the literature. 
An inspection of Equation (50) shows that were it not 
for the time-dependence in me primary beam and the 
time-dependence of the sky moving through a baseline’s 
fringes, the optimal recipe would indeed reduce to an in¬ 
tegration of visibilities in time. Finding the limit where 
time integration is optimal is then equivalent to finding a 
special case where the aforementioned time-dependences 
vanish. 

Recall that in our previous example, the primary 
beam had a time-dependence only because our thought- 
experiment consisted of a drift-scan telescope, whose 
measurement equation was written in coordinates fixed 
to the celestial sphere. Instead of this, suppose one had 
a narrow primary beam that tracked a small patch of the 
sky. The primary beam would then have a fixed shape 
in celestial coordinates, and A,y(r, t) would simply be¬ 


come Ajy{r) in Equation (47). To attempt to nullify the 
time-dependence of fringes sweeping across the celestial 
sphere, one may phase the visibilities in a time-dependent 
way, essentially tracking the center of the patch as it 
moves across the sky. Putting this all together and as¬ 
suming that the primary beam is narrow enough to jus¬ 
tify a flat-sky approximation, the measurement equation 
becomes 


Vbuit) = j A^(f)J^(f) 


X exp 


exp 


-i27r ( ^ sm{a — oJ^t) 


—i27T ( cos T] sin 6 ) + i'lpit) 


dTt + n(t), (51) 


where we have assumed for simplicity that the center of 
our small field is directly above the equator, and that a 
time-dependent phase '0(t) has been applied. With this, 
the optimal combination of time-ordered data becomes 


(AtN-^v)^ = ^ A»{a-co^t) 

b ^ 

(50) 

where the a variable serves as the continuous version 
of a discrete vector index. This expression shows that 
the optimal, minimum variance prescription does not call 
for the integration of visibilities in time. Instead, our 
expression calls for the convolution of the visibility data 
with a kernel that is specified by the primary beam shape 
and the baseline. 

Now, recall from our application of the convolution the¬ 
orem in previous sections that for interferometric data, 
convolution in time is equivalent to multiplication in 
fringe-rate space. Equation ( [^ therefore suggests that 
the optimal way to combinedifferent time samples is 


(A^N~^v)^^^ (^; Q^) ^ —i 27 r ^ cos r] sin 6 

(52) 

b 

This is still not quite a simple average in time because 
there is no choice of ^jJ{t) that can cancel out the time- 
dependence of sm{a — uj^t) for all ip and all t. Another 
way to phrase the problem is to note that even in the 
flat-sky approximation, one cannot expand Taylor ex¬ 
pand sin (a — oJ^t) over long observation times. With 
short observations, however, an expansion is justified. 
Performing this expansion, invoking the narrow field- 
of-view approximation in the a direction, and picking 
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'^(t) = —27r^cj0t gives 


tution 7/^ = uj^t — one obtains 


(AtN“V) 


flat,short 
((5,a) 


Ai/ ((5, Ct) —i27r ^ ^ cos 7] cos <5+ ^ sin 
CT^ 

/ dtVb„{t), (53) 

b •’ 


Vbu ifn) = I 

”p ^ — inip /“TT + V? 

‘ ■• • i(q—n)'ip-\-i cos (5 sin ■0 


E 


27r 


I 


d'lp e* 


.(58) 


TT — 


which is a simple averaging in time. In short, then, in¬ 
tegrating in time is an optimal way to combine time- 
ordered data if a number of criteria are met: the flat-sky 
approximation must hold (which can be achieved, for ex¬ 
ample, by having a small primary beam), the primary 
beam must track the field, the visibilities must be phased 
to track the center of the field, and the observations must 
be short. 


5.3. Optimized fringe-rate filtering for imaging 

We now proceed to derive the optimal prescription for 
combining time-ordered data, which will naturally give 
rise to fringe-rate filtering. Because we will not be in¬ 
voking the same approximations as we did in previous 
sections, we will be gin w ith Equation (47). From our toy 
example (Equation |5Q| ), we know thatlringe-rate space 
(the Fourier dual oftime) is a promising space in which 
to combine time-ordered data. Here, we make the as¬ 
sumption that we are dealing with a full sidereal day’s 
worth of data, so that the visibility in fringe-rate space 
is given by 


_ 1 

Vbu{f) = 7fr dtexp{-2Trift)Vb^{t), (54) 

Je J-T<s/2 

where / is the fringe-rate, and is the Earth’s 

rotation period. It is natural to work in fringe-rate bins 
such that the nth bin is given by fn = njT^^ where n 
is an integer. The measurement in the nth bin is then 
given by 

Vbuifn) = J 

T0 

7 2 dt cos (5sin(u;0t—a) 

X / ^ 7j^B{Y,t)e ^ ® ^(55) 

J-If. Te 

where we have temporarily omitted the additive noise 
term to avoid mathematical clutter. To proceed, we 
make some simplifying assumptions (although only some 
of which are absolutely required). First, assume that we 
are once again considering a drift-scan instrument. If the 
primary beam shape is approximately separable, we can 
then say 

B{r,t) = Bs{S)Baia- (56) 

where is a function with period 27r. Taking advantage 
of this periodicity, we can write the beam as 

B{r,t) = (57) 

Q 


where Bg = f ^Ba{o^)e^^^ is the Fourier coefficient. 
Plugging this into Equation (55) and making the substi- 


Now, the integral over 7 /^ is of a periodic function over 
one period. We may therefore freely shift the limits of 
the integral by a constant amount without affecting the 
result. In particular, we may remove the -\-ip terms in the 
limits (the only restriction being that having performed 
a (/:?-dependent shift, it is no longer legal to permute the 
various integrals), and the result is a standard integral 
form for a Bessel function J of the first kind: 


n.(/n) 


/ 


— T{v)Bs{5)e-^‘^^'^ cos,,sin6g-ina 

‘I'K 


X ^ Bg J„_q COS (5^ . (59) 


Several features are of note here. For wide primary 
beams, Bg is sharply peaked around q = 0, so the 
terms following the sum over q essentially amount to 
Jn {‘^nbo cos 6/X). Now, notice that the argument of 
the Bessel function is bounded, always lying between 
±27r6o/A. For large n (high fringe-rate bins), then, one 
can use the small argument asymptotic form for J^, 


which is a sharply decreasing function of n for large n. 
This means that there must be very little sky signal at 
high fringe-rate bins, which is yet another reflection of 
the salient feature that we have emphasized through¬ 
out this paper: high fringe-rate bins constitute noise- 
dominated modes, since the Earth’s rotation rate works 
in conjunction with the baseline length to impose a maxi¬ 
mum fringe-rate for sources locked to the celestial sphere. 

Putting together the optimal prescription as we did 
above, we have 

^ cos? 7 sin^ 

V J6,a 27 rcr 2 ^ 

b,n 

X^B^Jn.g (?^COsb VbAfn). (61) 
q ^ ^ 


Jn 


27r6o cos 6 
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I 

n\ 


irbo cos S 
A 


(60) 


In words, this recipe instructs us to move into 
fringe-rate space (where the sky emission is already 
concentrated in /^) and to further downweight by 
Bq Jn-q COS (5), which, as we have argued above, 
is small for high fringe rates. Thus, filtering away the 
high fringe-rates is the optimal way to combine time- 
ordered data from an interferometer. 

In summary, we see that downweighting high fringe- 
rates once again appears as an integral part of an optimal 
recipe, this time for mapmaking. While we made some 
assumptions (such as a separable beam) for analytic con¬ 
venience, our qualitative conclusions are general. To gen¬ 
eralize our treatment, one can simply return to Equation 
(55), numerically evaluating the integral over time. One 
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can then read off an expression for A from the remaining 
integral, and proceed as before. 

6. CONCLUSION 

In this paper, we have revisited the concept of filtering 
the visibility time-series meas ured by an interferometri c 
baseline that was presented in Parsons & Backer (2009). 
Using a mapping between the timescale of variation m 
visibility data and position on the sky for a chosen base¬ 
line, we have seen that the rectangular time windows 
typically used when integrating visibilities are almost al¬ 
ways sub-optimal (particularly for wide-held, drift-scan 
instruments), and motivate hltering on the basis of fringe 
rate as a step for optimally combining time-ordered visi¬ 
bility data. In Section we found that fringe-rate hlter¬ 
ing also naturally arises as part of optin iized mapmakin g 
prescriptions such a s those desc ribed in|Tegmark|(|1 997|) 


Morales & Matejek (2009b), or Uillon et al. 


We also showed that fringe-rate hltering can alter¬ 
nately be interpreted as a per-baseline operation for 
sculpting the primary beam along one dimension. Us¬ 
ing analytic derivations and simulations, we highlight 
several important applications of such beam sculpting. 
One key application for 21 cm cosmological experiments 
starved for sensitivity is the ability to re-weight visibility 
data according to the signal-to-n oise ratio in each fringe- 
rate bin. As shown in Section |4.I[ pre-processing the 
visibilities in such a way allows an optimal (minimum 
variance) estimate of the power spectrum to be formed 
by squaring visibilities time-slice-by-time-slice before av¬ 


eraging the results together. This somewhat surprising 
result allows a data analyst to deal directly with visi¬ 
bilities until the hnal squaring step, allowing for cleaner 
diagnoses of instrumental systematics while avoiding po¬ 
tential analysis-induced systematics associated with grid- 
ding data on the uv plane. Other important applications 
include improving the match between polarization beam 
to reduce polarization leakage, and down-weighting ar¬ 
eas low in the primary beam to reduce systematics from 
off-a xis foregrounds. 

In Ah et al. (2015), the fringe-rate filtering techniques 
presented here are applied to observations from the PA¬ 
PER array as part of their power-spectrum analysis 
pipeline. The results highlight the power of fringe-rate 
filtering in 21 cm cosmology applications. Given its effi¬ 
ciency, flexibility, and close alignment with the natural 
observing basis of radio interferometers, we anticipate 
that fringe-rate Altering may be an important analysis 
tool for current 2Icm experiments, as well as future in¬ 
struments such as the Hydrogen Epoch of Reionization 
Array (HERA; |Pob er et al.||2QI 4|) and the Square Kilo¬ 
metre Array (S kA; |Carilli||2OTf| r 
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